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1  Summary  of  Progress 

The  overall  (three-year)  goal  of  the  proposed  work  is  to  develop  a  classical  density  functional  theory  (DFT) 
of  ions  and  charged  polymers  near  dielectric  interfaces  in  a  three-dimensional  system.  Over  the  3  years  of  the 
grant,  we  have  had  great  success  and  some  setbacks,  as  might  be  expected  for  any  research  project.  These 
include: 

•  The  biggest  success  was  publication  of  the  paper  describing  the  general  theory  in  Journal  of  Physi¬ 
cal  Chemistry  Letters.  This  is  the  big  theory  paper  to  come  out  of  this  project  and  represents  the 
most  important  goal  of  the  proposed  work  (i.e. ,  constructing  the  functional  itself).  In  the  year  since 
publication,  it  has  already  been  cited  3  times,  indicating  an  impact  of  the  work. 
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•  Numerical  implementation  of  an  effecient  algorithm  to  solve  the  dielectric  DFT  equations  is  still  being 
implemented  and  work  on  this  front  will  continue  beyond  the  completion  of  the  grant.  This  is  the 
biggest  setback  of  the  project.  The  finding  of  an  efficient  algorithm  to  numerically  solve  the  dielectric 
DFT  equations  was  much  more  difficult  than  aniticipated.  The  root  of  the  problem  is  that  the  reaction 
field  is  not  a  radial  interaction  between  ions.  Because  of  this,  fast  Fourier  transforms  (FFTs)  can  no 
longer  be  used  and  one  must  be  clever  in  making  the  program  scale  with  N  log  N  rather  than  N2  where 
N  is  the  number  of  grid  points  to  be  solved. 

•  On  the  other  hand,  one  great  numerical  success  was  in  implementing  fast  computational  methods  for 
the  reaction  in  3D.  Matt  Knepley  at  the  Computational  Institute  of  the  University  of  Chicago,  through 
a  subcontract  of  this  grant,  has  implemented  a  Fast  Multipole  Method  (FMM)  and  published  a  paper 
on  this  topic. 

•  An  exciting  offshoot  of  the  DFT  work  was  a  collaboration  with  Dezso  Boda  in  Hungary  that  resulted 
an  alternative  method  (not  using  DFT  at  all)  of  ions  at  dielectric  interfaces  that  we  developed.  The 
specific  goal  of  the  proposed  work  is  to  develop  a  DFT  of  ions  at  a  dielectric  interface  and  this  has 
been  done  (as  just  mentioned).  The  broader  goal  of  the  proposed  to  work  is  develop  methods  that 
any  scientist  can  use  when  working  with  ions  at  a  dielectric  interface,  especially  in  three  dimensional 
geometries.  We  have  started  to  develop  another  method  with  Dr.  Boda,  a  long-time  collaborator,  called 
the  Local  Equilibrium  Grand  Canonical  Monte  Carlo  (LE-GCMC).  Because  we  have  already  developed 
a  fast  method  to  include  dielectrics  in  Monte  Carlo,  this  method  will  allow  for  computing  ion  currents 
(currently  at  the  drift-diffusion  level,  but  not  limited  to  that)  in  complex  3D  geometries  that  will  be 
very  challenging  with  DFT.  The  grant  sponsored  Dr.  Boda’s  one  month  visit  to  the  Pi’s  institution  to 
work  on  this.  It  is  important  to  note  that  this  work  is  in  addition  to  the  DFT  development;  LE-GCMC 
came  out  of  discussions  that  Dr.  Boda  and  the  PI  had  in  discussing  the  dielectric  DFT.  Moreover,  Dr. 
Boda’s  visit  helped  greatly  in  moving  toward  an  efficient  algorithm  to  numerically  solve  the  dielectric 
DFT  equations. 

•  Another  offshoot  of  this  work  is  the  hope  of  coupling  the  new  dielectric  DFT  functional  to  experiments. 
This  work,  which  will  continue  beyond  the  completion  of  the  grant,  is  with  experimentalists  working 
on  nanofluidic  devices.  This  resulted  in  2  papers  (a  third  is  in  preparation),  including  one  in  the 
prestigeous  Nano  Letters. 

Below,  a  technical  summary  of  the  scientific  work  done  over  the  course  of  the  grant  is  given. 


2  Technical  Details  of  Dielectric  DFT  Implementation 

In  the  following  ions  are  in  dielectric  e  near  a  dielectric  interface  with  dielectric  constant  ewan-  The  interface  is 
assumed  to  be  smooth  and  hard  so  that  ions  (assumed  to  be  charged,  hard  spheres)  cannot  overlap  any  part  of 
the  wall  dielectric.  The  goal  is  write  down  an  approximate  free  energy  functional  FD  {{pu  (x) }]  to  determined 
the  ion  density  profiles  pi  (x)  for  all  the  ion  species  i.  This  is  done  by  finding  the  dielectric  component  of  the 
electrochemical  potential,  which  is  given  by  the  functional  derivative  of  FD  (i.e. ,  SFD /Spi  (x)).  The  total 
electrochemical  potential  (of  which  we  only  determine  one  component  in  the  proposed  work)  is  then  used  to 
find  the  equilibrium  density  profiles  pi  (x).  These  are  plotted  in  Fig.  1. 
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Figure  1:  Comparison  of  the  new  dielectric  DFT  (lines)  with  MC  simulations  (symbols)  for  five  different 
dielectric  constants  of  the  wall  (the  ions’  dielectric  constant  is  always  80).  Top  panel:  0.1  M  monovalent 
ions,  both  with  diameter  3  A.  Both  cation  and  anion  profiles  are  identical  in  this  case  so  only  one  is  shown. 
Bottom  panel:  1  M  divalent  cation  and  2  M  monovalent  anions,  both  with  diameter  3  A.  The  cation  profile 
is  shown. 


2.1  Perturbation  theory 

For  a  general  three  dimension  system,  the  interaction  potential  perturbation  technique  gives  that  the  dielec¬ 
tric  contribution  to  the  free  energy  functional  is  given  by 

Fd  [{ pk  (x)}]  =  ^YlJa  J  J  pi?  (x’ x';  “)  (x’ x ')  dxdx'da  (1) 

^•>3 

+  \STJ  j  Pi  W  i>u  (x’  x)  dx 


where  p;^1  (x,  x';  a)  is  the  two-body  distribution  function  for  an  ion  of  species  i  at  x  and  an  ion  of  species  j 
at  x'  for  the  interaction  potential 


VYi  (x> x ')  =  i’ij  (x, x)  +  o4ij  (x, x') 


(2) 


with  ipPj  (x,x')  the  Coulomb  interaction  and  ip?  (x,x')  the  dielectric  reaction  potential  (i.e. ,  the  potential 
felt  by  an  ion  with  valence  Zj  at  x'  produced  by  an  ion  with  valence  Z{  at  x).  When  the  system  has  a  planar 
dielectric  interface, 


iPij  (x>x')  « 


^  ^wall 
€  +  Cwau 


(3) 


(This  is  still  a  very  good  approximation  even  when  the  dielectric  interface  is  not  a  plane.)  Therefore,  we  can 
make  the  correspondence 


a  = 


£  -  ea 
e  +  £« 


(4) 


where  eQ  ranges  between  e  and  ewan .  (The  absolute  value  is  necessary  to  prevent  a  <  0  when  ewan  >  e.) 
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We  first  describe  the  two-body  distribution  function  in  terms  of  a  pair  correlation  function  g^  (x,  x';a): 

Pif  (x,  x';  a)  =  pi  (x;  a)  pj  (x';  a)  g{j  (x,  x';  a) .  (5) 

We  know  that 

9ij  (x,x';a)  =  0  (|x  -  x'|  <  +  Rj)  (6) 

when  the  ions  overlap  and  that  when  a  =  0  only  the  ion-ion  correction  /iff  (x,  x')  and  the  a  =  0  ion  profile 
near  the  wall  contribute.  Each  of  these  factors  is  approximated.  First,  we  assume  that  the  densities  pk 
change  linearly  with  a: 

pk  (x;  a)  =  (pk  (x)  -  /40>  (x))  a  +  p Jl0>  (x)  (7) 

where  pk  (x)  is  the  density  profile  we  aim  to  calculate  (a  =  1)  and  p^  (x)  is  the  density  profile  of  the 
unperturbed  system  in  the  absence  of  dielectric  interfaces  (a  =  0)  for  which  we  presume  there  is  a  good  DFT 
available.  This  linear  change  with  a  has  been  varified  in  the  Monte  Carlo  simulation  results  of  Fig.  1. 

Second,  we  assume  that  the  total  correlation  function  (TCF)  is  the  sum  of  TCFs,  one  from  an  ion  of 
species  i  at  x  and  one  from  its  image  charge,  with  a  residual  term  from  the  case  of  no  induced  charge: 

gij  (x,  x';  a)  —  1  «  /if/  (x,  x')  +  /if/11  (x')  +  h?  (x,  x';  a) .  (8) 

The  dielectric  TCF  is  due  to  ions  packing  around  the  image  charge  which  we  assume  (by  Debye-Htickel)  is 
proportional  to  the  magnitude  of  the  image  charge  and  is  otherwise  the  bulk  TCF: 


hf,  (x,  x';  a)  s=s  - — (x*,  x')  =  a  ■  sgn 
3  e  +  ewan  13  V 


£  ^wall 
C 


hu  (x*,x') 


where  the  location  of  the  image  charge  is  x*  (which  is  a  function  of  x).  Therefore, 

'  fcJT  (x,  X7)  +  hjf  (x,  x')  +  a  •  sgn  (f=^)  •  h$  (x* ,  x')  |x  -  x'|  >  R,  +  Rj 


gij  (x,  x7;  a)  -  1 


-1 


lx  —  x'|  <  Ri  +  Rj 


(9) 


•  (10) 


If  we  chose  hfjn  to  be  the  bulk  TCF  h^j  ,  then  the  total  charge  from  the  ions  around  the  central  ion 
would  not  necessarily  cancel  the  central  charge.  Therefore,  h1™  is  defined  as  a  rescaled  bulk  TCF : 


Xi  (x)  /i/ulk(|x-x'|)  if  |x-x'|  >  Rij 


-1 


otherwise 


(11) 


where 


with 


Xi  (x)  = 


Hjlii  (x) 


g%j  (x)  =  Zjep 


bath 

3 


'Ix-x'l  >Ri 


KT  (Ix-x'De 


-x'\)e-vr(*)/kTdx.'. 


(12) 

(13) 


qij  (x)  is  the  charge  of  species  j  around  an  ion  of  species  i  located  at  x.  By  the  overlap  condition  in  Eq.  (6) , 
we  then  require  that  /i/a11  and  h®  are  0  when  |x  —  x'|  <  Ri  +  Rj. 

The  uncharged  wall  TCF  /iff11  is  nothing  but  the  ion  profile  of  the  ions  when  a  =  0: 


(*') 


Ix-x'l  >  Ri  +  Rj 
0  |x  —  x'|  <  Ri  +  Rj. 


(14) 
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With  these  approximations, 

Fd  [{ pk  (x)}]  -  ^  J  Pi  (x)  (x, x)  dx 


E  /  J  J  [pi  M  “  Pi0)  (x))  a  +  Pi0)  (x)  (pj  (x')  “  pf  (x/))  a  +  pf  (x') 

i,3  J° 

€  ^wall  \ 


(15) 


x  K?  (x,  x')  +  hjf  (x')  +  a  •  sgn 


€  +  Cwall 


•  hP  (x*,  x')  +  1  ^  •  (x,  x')  dxdx1  da 


=  g  E  J  J  [pi  (x)  -  Pi  (X)J  ( Pi  (x')  -  Pj  (X')J  (hT  (x’ x')  +  1  +  K?n  (x'))  ipFj  (x,  x')  dxdx'  (16) 

ij  J 

+  ^  E  /  J  (pi  w _ pi0)  (x))  pf  (x'(  (hijn  (x’ x')  + 1  +  Kf  (x')) 4>ij  (X5 x0 dxd* 

ij  j 

+  \  E  I  f  (pj  (x0  -  pf 1  (x'))  p!0>  (x)  (^i°n  (x, x')  +  1  +  ^7/n  (x0)  ^  (x, x')  dxdx' 


h3 


+  ?£/  Pi  (x)  Pj  (x')  (x,  x')  +  1  +  hjf  (x'))  tpP  (x,  x')  dxdx' 


1  (  €.  Cwa.ll  \ 

+  o  SSn 


\  e  +  Cwall  /  z“ 


E  /  /  (pi  (x)  “  pf  (x))  (ft  (x')  -  pf  (x'))  hfj  (x*,  x')  ipP  (X,  x')  dxdx' 


E  /  /  (pi  (x)  “  di0>  (x))  pf  (x')  /ig  (x*,x')  IpPj  (x,  x')  dxdx' 


1  /  e  ^wall  \ 

+  a  sgn  “ee - 

6  \  ^  H”  ^wall  / 

.1  ( €  ^wall  \ 

6  H-  ^wall  J 


+  i  sg“  (rrS)  ?  /  /  A™  <x)  (x,)  '■« (: 

l,3 


(  [  (Pi  (x')  _  dj0>  (x'))  pf' ’  (x)  hP  (x*,  x')  V’y  (x,  x')  dxdx' 


x*,  x')  (x,  x')  dxdx 


Then 


M?  (x)  = 


SFd 
Spi  (x) 

qE/  (di  (x')  -  Pf  (x' 


"»j  (x,  x')  +  1  +  hjj  (x'))  ipP  (x,  x')  dx' 
+  ^  E  /  dE  (x')  (d‘°n  (x,  x')  +  1  +  ft)*/11  (x'))  ^  (x,  x')  dx' 

^  E  /  (pi  (^  ~  dj0>  (x'))  dg  (x*,  x')  tpP  (x,  x')  dx' 


sgn 


sgn 


£  ^wall 
C  “h  Cwall  / 

^  ^wall 
€  +  ^wall 


)?/ 


P  (x')  hP  (x*,x)  3pP  (x,  x)  dx 


(17) 


+  2  (X’X(  ' 
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This  can  be  rewritten  because  the  terms  without  the  correlation  function  are  solutions  to  Poisson  equations: 

P?  (x)  ~  j  (x0  +  (x0)  hi?  (x’  x0  (x’ x^  dx  (18) 

+  T^Zie^D  (x)  +  jUiec^  (x)  (19) 

+  J  Qft  (x')  +  \pT  (x'))  Kf  (x0  i> ij  (x,  x')  dx' 

+  sgn  (g  +  gWa“)  5Z  /  (x^  +  (x'))  (x*> x')  V’y  (x,  x)  dx 

+  ^i>U  (x,x) 

where  cpD  is  the  solution  of  the  Poisson  equation  with  the  ion  density  profiles  at  their  image  charge  locations 
and  with  corresponding  reduced  charge. 


2.2  Single  planar  dielectric  interface 

If  the  dielectric  boundary  is  a  single  planar  interface  at  x  =  0  with  dielectric  constant  e  where  the  ions  are 
and  ewau  behind  the  wall  (where  no  ions  are),  then 


(x> x)  =  'tpij  0  +  y  -  y',  z  -  z') 

_  ZiZje  6  ^wall  1 

4^ee0  e  +  ewan  ^  +  ^  +  ^  _  y/)2  +  ^  ^ 


(20) 

(21) 


where  x  =  (x,  y,  z )  and  similarly  for  x'.  The  densities  have  planar  symmetry  so  that  pj  (x') 


Pj  (a/).  Then 


because 


j  Pj  ( x> )  ( fj Kj1  (x,  x')  ip?  (x,  x ')  dy’dz^j  dx’ 

=  J  pj  (; x ')  j hlPn  ^a;  —  x' ,  \/  u2  +  v2  j  ip?  (x  +  x' ,  \/ u2  +  v2^j  dudv^j  dx1 
=  2n  f  pj  (a/)  (  f  why11  (x  —  x',  w)  tpy  (x  +  a/,  w)  dw  \  dx' 


ZiZje 2  e  -  ewan 

2ee0  e  +  ewaii 


Pj{x ')  /  (x  —  x',  w) 


w 


\j  {x  +  x')2  +  w2 


dw  dx' 


i’ij  (a>w)  = 


ZiZje  e  -  ew 


We  must  therefore  evaluate  the  integral 


hyn  (x  -  x',  w ) 


47ree0  e  +  ew  vV  +  w2  ' 


dw=  I  /i)°n 


\j  {x  +  x')2  +  W2  J\x+x'\ 


—  (x  +  a;')2  I  dv 


(22) 

(23) 

(24) 

(25) 

(26) 


where 


(27) 
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2.3  Ion  TCF 

For  the  ion  TCF,  in  Eq.  (11)  we  define 


hlon 

nij 


Xi  (x)  (|x  -  x'|)  if  |x-x'|  >  Rij 

—  1  otherwise. 


(28) 


Now  we  make  the  ansatz  that  the  bulk  TCF  has  the  form  of  a  Yukawa  potential  Y  ( r;a,b ),  similar  to  that 
of  the  linearized  Poisson-Boltzmann  solution,  but  in  this  case  the  coefficients  may  be  fit  to,  for  example,  the 
solution  of  the  nonlinearized  Poisson-Boltzmann  equation: 


/ikulk 


(r)  w  Hij  (s,  w)=Y  ( s ,  w;  cf,c%,Rij 


Vs2  +  w2 


exp 


Rij  —  V  s2  +  w2 


(29) 

(30) 


where  Rij  is  the  contact  radius  of  the  two  ion  species  and  the  cj!  are  constants  (either  from  LPB  or  fitted 
to  the  NPB  solution). 

Next  we  evaluate  Eq.  (26).  For  h'j' ,  it  is  — 1  when  |x  —  x'|  <  /?.,,■  and  the  decaying  function  Hij 
otherwise.  For  |x  —  x'|  <  Rjj,  we  must  have 

Rij  >  (x  -  x’)2  +  w 2  (31) 

=  (x  —  x’)2  +  v2  —  (x  +  x’)2  (32) 


or 


v2  <  Rfi  +  4xxr. 

LJ 


But,  v  is  also  bounded  below  because  w  >  0,  so  for  |x  —  x'|  <  Rtj  we  must  have 


Therefore,  if 


then 


Otherwise,  if 


then 


(a:  +  x')  <  v2  <  R~j  +  Axx' . 
\x  +  x'\  <  ^ R2j  +  4nJ, 


'  l^c+x'l 


h1^11  [x  -  x' ,  H  v2  -  (x  +  xf)  dv 


fVR  %+4xx'  n  OO 

/  dv  +  / 

>\x+x'\  J  yj  -\-4xx' 


Hij  I  x  —  x' ,  y  v2  —  (x  +  x')  )  dv 


=  \x  +  x'\  —  \  R?x  +  4xx' 


'  yj  +4xx' 


—  x' i  \J v2  —  (x  +  x')2^j  dv. 
\x  +  x'\  >  yj R^j  +  4xxf, 


/  x-x\Hv2  -  (x  +  x')  I  dv  —  /  (  x  —  x' i  y  v2  —  (x  +  x')  )  dv 

I\x+x'\  \  /  J\x+x'\ 


(33) 

(34) 

(35) 


(36) 

(37) 

(38) 

(39) 
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We  can  put  these  together  by  letting 


so  that 


( x ,  x')  =  max  |  \x  +  x'\ ,  +  Axx’ | 


(40) 


(41) 


This  last  integral  is  the  first  major  bottleneck  in  the  numerical  calculation  because  of  the  x  —  x’  and 
x  +  x'  dependence  of  the  integrand.  This  makes  it  nearly  impossible  to  do  in  O  ( N )  time  and  generally  in 
O  ( TV 2)  time.  O  (N2)  time  is  not  useful  for  practical  applications.  For  example,  the  profiles  in  Fig.  1  we 
done  using  N  =  128  grid  points  and  took  '2  minutes.  For  practical  problems,  at  least  N  =  1000  is  necessary, 
if  not  N  =  10000.  With  O  (N2)  time,  this  would  take  ~200  minutes  which  is  not  practical.  The  last  period 
of  the  grant  spent  significant  portions  of  time  to  evaluate  this  integral  efficiently,  using  various  techniques 
from  polynomial  interpolation  to  Gaussian  quadratures  but  none  have — so  far — been  efficient  enough  for  a 
practical  real-world  problem. 


2.3.1  Scaling  of  ion  TCF 

By  Eq.  (13), 


Qij  (x)  =  zjePj 


.bulk 


'|x-x'|>fl; 


h\°jn  (|x  —  x'|)  e_v?xt(x0/feTdx/ 


=  27 TZjepjUlk  f  e  Vj  (x)/kT  (  f  wHij  (x  —  x',  w)  dw  J  dx' 

J  \J  Mij(x— x')  J 


where 


M2-  (x  -  x')  =  max  j R 2-  -  (x  -  x')2  ,  o| 


(42) 

(43) 

(44) 


because  the  lower  bound  for  the  radial  coordinate  w  (in  cylindrical  coordinates)  is  R2j  —  (a;  —  x1)2  if 
\x  —  x'\  <  Rij  and  0  when  x  —  x'  is  outside  the  contact  sphere.  Then 


/ Mij  (x—x') 


=  AjeR‘lc* 


wHi  j  (x  —  x1 ,  w)  dw 


w 


J  Mij  (x  x')  ^x^x’f  +  W2 

The  integral  can  be  evaluated  with  the  substitution 


:  exp  - 


y/(x- x')2  +  w2 


dw. 


(45) 


v  =  y  (x  —  x')  +w2 


dv  = 


w 


\]{x  —  x')2  +  w 2 


dw. 


(46) 

(47) 


Then 


foo  1 

/  w  /  = 

J Mjj(i-i')  y  (a:  —  a;')2  + 

/■oo  / 


I2  -L  y;2 


/  max{Rij  ,\x— x'\} 

jj  (  max  {Rjj, 
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-  c2  exp 


exp^JJ^ 

{Rij,  \x  -  x'|} 


Ji 

c2 


so  that 


Qij 


0 x )  =  27r^epJbulkci-?'4-7e^'/^  J  e"^**  i*'VkT  exp  ^_max{^R^  ^1} 


,‘00 

i  , 

exp 


max |x  —  x7|} 


c2 


poo 

=  2nzjepYmc[jcijeR‘/c*  / 

J  Rj+x , 

=  2nzjep)*rk<$<%eRiR,2  /“exp  /_max(^R  ko  ~  ^0|}\  ^ 

•t  R,  V  / 


dx7 


where  is  the  x  location  of  the  wall  and 


X  Q  3/  X  yj 

distance  of  x  from  the  wall,  with  a  similar  definition  for  Xg. 
i  evaluate  max  [Rij,  |xg  —  Xg|},  consider  that  |xg  —  Xg|  <  Rij  is  equivalent  to 


is  the  _ _  ~  . _ 

To  evaluate  max  [Rij.  |xo 

Xg  -  Rij  <  Xg  <  Xg  +  Rij. 


For  the  ions,  x0  >  0  so  that 


rap(_fEaiMiU 

Jr j  V  c2  J 


Ru  \  fx° '  u 


-  exp  I  —  —J 

\  C /  J  max 


dx  q  + 


poo 


'  I 

:{x0-Rij,Rj}  J  xo+Rij 


exp 


Jr 


\  —  / 

.max{*0-«ii,flj}  f  Ug-x'A 

exp - Ti -  dxo 


rij 

c2 


=  exp 


R, 


_ij_ 

rij 

c2 


poo 

I  (xo  +  Rij  —  maxjxo  —  Rij,  Rj})  +  /  exp 

JRii 


r- max{xo— Rij  ,Rj}—x o 
/  Rj  —xq 


exT? 


I  dsr 


(48) 

(49) 

(50) 

(51) 

(52) 

(53) 

(54) 


(55) 


(56) 


9 


The  last  integral  can  be  evaluated  as  follows: 


Then 


^max{xo- Rij  ,Rj}—xo 


>  Rj-x o 


exp - —  ds' 


L,  -  o  exp  ( 

yJw)  ds' 

if  max  {xo  —  Rij,  Rj}  =  Xq  —  Rij 

(57) 

0 

otherwise 

lZ~Ri  -p  ( 

'  -i)d»' 

K  c2  / 

if  max  {xo  —  Rij,  Rj}  =  Xo  —  Rij 

(58) 

0 

otherwise. 

exp  - 


max  {Rij,  \x0  -  I } 


dx'r. 


=  exp  (  %  |  {x$  +  -  max  {x0  -  Rij,Rj})  +  cj  exp  ( -  La 


+cj 


exp  -  exp  (  -  H  )  if  max{i0  -  R,r  Rj}  =  x0  -  Rj 

0  otherwise 


(59) 

(60) 
(61) 


which  defines  qij  ( x )  through  Eq.  (52). 


2.4  Dielectric  TCF 

For  the  dielectric  TCF  hR ,  we  use  an  LPB  approach  around  each  surface  charge  element  a  at  s.  Specifically, 
consider  an  ion  of  species  i  at  x  that  induces  a  surface  charge  profile  <Tj  (s;  x)  for  s  on  the  dielectric  interface 
S.  Next,  we  assume  that  the  ions  accumulating  around  this  surface  charge  element  make  a  potential 


,  a,  (x)  Ui  (s;  x)  I  |s  -  x' 

<t>i  (s,  x  ;  x)  «  — — —  exp 


47T£o  |s  —  x'I  \  A 
where  Y  is  the  Yukawa  potential  with  characteristic  lengthscale  (the  Debye  length) 


A  = 


eeo  kT 


e2£,-*Math 


^ 3  3  r3 

and  with  an  as-yet-undetermined  coefficient  (x).  Because  the  LPB  is  additive, 


$i(x';x)=  /  cj)i  (s,  x' j  x)  ds 

Js 


ai  (x)  f  <7j( s;x) 


w  exP  - 


With  the  LPB  approximation, 


hfj  (x,x') 


47T60  Js  |S-X' 


~TcT^i  (x7;  x)  |x-x'|>i?j 


ds. 


0 


lx  -  x'I  <  R, 


(62) 

(63) 

(64) 

(65) 

(66) 
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In  the  planar  geometry, 


,  /'  <Tj  (s;  x)  (  |s-x'|\ 

=  7|=p1_ — —  J* 


=  a»  (x) 


1  ^  ^wall 

27T  e  +  ewan  47ree0 


oo  /»oo 


x0 


exp 


—  oo  «/  — oo 


r 

1/2 


-  (K)2  +  ( V '  -  Sy)2  +  (z'  -  SZ)2)  /A 

(xo  +  (2/  -  s!/)2  +  (*  “  sz)2)  7  (Vof  +  (2/  -  S1/)2  +  (*'  -  sz)2) 

This  double  integral  can  be  simplified: 

(Vo)2  +  W  ~  sv?  +  (z'  -  szf)  7  A 


(67) 


dsydsz. 


oo  /»oo 


x0 


exp 


—  oo  J  — oo 


oo  /»oo 


—  oo  J  — oo 

/»oo  p2n 
1 0  «/  0 


—duyduz 


(V  +  (2/  -  Sy)2  +  (^  -  sz)2)  7  (K)2  +  (2/  -  Syf  +  (2/  -  SZ)2)  7 

1  /2 

Xo  exp  -  (V0)2  +  u2y  +  ufj  /A 

(V  +  K  +  V  +  +  z )2)  7  (Vo)2  +  u2y  + 

_ v _ exp  [-  ((ij)2  +  e)‘/2/A 

+  (r  cos  (0)  +  Y )2  +  (r  sin  (0)  +  Z)2^  ^(x'0)2  +  r2) 


—  dsydsz 


n2n 


dO 


exp 


=  x0  r 

Jo  \  Jo 


dOdr 


1/2 


-  ( (so)2  +  r2  )  A 


(x(j  +  (r  cos  (9)  +  Y )2  +  (r  sin  (9)  +  Z)‘ 


3/2 


(Vo)2  +  r2) 


1/2 


where 


j,y  =  sy  —  y’  =  r  cos  (0) 


uz  =  sz  —  z’  =  r  sin  (0) 

with  y  =  y'  —  y  and  Z  =  z’  —  z.  With  the  substitution 


tan  (c)  = 


9  =  t  +  c 
a 


one  can  show  that 


[•ITT  /-X7T  _ 

J  f  (a sin  (9)  +  b cos  (9))  d8  =  J  f  y\/ a2  +  b2  cos  (r)J  dr 

=  2  J  f  (  \/a2  +  b2  cos  (r)^  dr. 


(68) 


(69) 


dr  (70) 


(71) 

(72) 


(73) 

(74) 


(75) 

(76) 
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Therefore, 


io  (cCq  +  r2  +  Y2  +  Z2  +  2r  {Y  cos  (6)  +  Zsin  (0)))3^2  Jo  (xq  +  r2  +  w2  +  2rw  cos  (r))3^2 

2  r  dr 

(*8  +  r*+«,>)Wc,  (i  +  jj^cosM)’ 


and  finally 

(x';x)  =  (x0,x'0,w) 


.  .  C  ^wall  Z{G 

=  ai  (x)  — - -A - ^0 

e  +  ewaii  47reeo 


all  47reeo  Jo  _|_  r2  _|_  w2^ 


2  I  r2  I  7/j2'\3/2  V  +  r2  + 


exp  -  ((x'p)2  +  r2)  7  /A 

(V0)2  +  r2)  7 


where 


r(p)  =  -  f 

77  Jo 


77  Jo  (1 +pcos  (r'))3,/2 

For  0  <  p  <  1,  the  function  /  (p)  can  be  well  approximated  by 

.  .  1  -  0.962  278  036p  +  0.675  857 675 p2  -  0.262  241 607p3 


0*0  =  51  /  Pf  (x')  [  h%  (x’ x0  ^  (x’ x^  )  da:' 


XJj  J  pt  0*0  (//  (x';  x)  ^5  (x’ x')  Jy'J2')  <**;' 


=  -27r^E^-  /  4V) 


I  (xo—x'0)2+w2>Ri 


(a’o,  Zq,  u>)  {x  +  x' ,  w)  dw  dx' .  (85) 


This  last  integral  is  the  second  major  bottleneck  in  the  numerical  calculation  because  of  the  x  and  x' 
dependence  of  the  integrand  and  because  for  each  x  and  x' .  (a’o,  x'0,  w)  must  be  calulated  in  another 

difficult  integral. 

2.4.1  Scaling  the  dielectric  TCF 

Like  with  the  ion  TCF,  we  require  charge  neutrality  of  the  ions  around  the  dielectric  induced  charge: 

— e  J  (s;  x)  ds  =  e  ^  Zjp^ath  J  h®  (x,  x')  e~Vj (x  VfcTdx'.  (86) 


For  the  planar  case,  we  know  that 


e  ai  (s;  x)  ds  =  Zie- 

Js  * 


From  above  we  have  that 


hy  (X:  x')  = 


-^$i(x';x)  |x-x'|  >  Rij 

0  x  —  x'  <  Ra 
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so  we  wind  up  with 


J  <7i  (s;  x)  ds  =  e  ^  -2jpbath  j  h®  (x,  x')e  yj(x  )/kTdx' 


=  -  —  V02obath 

hT  2-^1  3  V] 


[  <Fj  (x;;  x)  e~Vj^')/kT dx1 

J  |x— x'\>Ra 


2  r  oo  [• 


(xq —x'0'\2+w2>Ri 


w$i  (xq,  x'0,  w)  dwdx'0 


2  _  /*oo  /»oo 

=  -^£4W  .. 


Rj  J  Mi  j[x o  ®0) 


(xq,  x'0,  w)  dwdx'0 


e2  \  /'°°  Z100 

=  -2tt  —  E  /°iath  J  J  (xo,  rfwdx'o 


e2  rMij(x  o-x'0) 

+  27rT^EW  /  /  (x0,x'0,w)dwdx'0 

K1  ,  ‘  dfl;  JO 

J  J 


where 


(x0  -  x'0)  =  max  |i?y  -  (x0  -  x[,)2  ,  o|  . 


For  the  first  integral,  we  use  the  fact  that 


(x§  +  r2  +  w2)3/2  \Xq  +  r2  +  w2 


dw=  — 


to  determine  that 


2  /-oo  /-oo 

-  2nkf  E  W'  J  J  w<^i  (X°’ X'0’ w ) 


=-2^e 


2  /*  oo  /»oo  „ 

e  2  bath  /  /  N  e  —  ewall  Zje  /  X 

T^l^ZjPj  /  ai(X )— - 7 - x0  ~ 

kT  J  J  JR  e  +  ewaii  47ree0  d0 

3  J 


_  e 

=  -2?r^E 


3 


_  e 

=  -2?r^E 


z*oo  /»oo 

1  ,  x  e  -  ewan  2*e  / 

a*  (x)  — - "j -  /  r 

r  e  +  £waii  47reeo  jg 


di  (x)  - — ^^-—2—  /  exp  [— u/A]  dudx'0 
e  +  £waii  47reeo  Jx> 


d  r  exp  -  ((x'0)2  +  r2)  '  /A 

X°  ((x'o)2  +  r2)V2 

exp  -(V)2  +  r2)  7  /A 

- HT - -dxda 

(Vo)2  +  r2) 


3 


o  e  /  \  e  ewall  -v  2  b 


exp  (  — - -  dx  o 


-  -2t r— a  fx)  6 _ A2  V  ~2obathexD  I  — - 

-  Zn^  [x)  £  +  £wau  47re£o  LVj  exP  ^ 
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For  the  second  integral, 


2^E*Ma 


oo  rMij  (xo  — Xq) 


w$i  (xo,x'0>  w)  dwdx o 


0  ^  /  \  ^  ^wall 

=  Z7T-—ai  (x) - - - 

kT  e  +  ewan  47ree0 


x o  E  (xo) 


exp  -(V0)2+r2)  7  A 
((a;'0)2+r2)  ' 


drdx'0  (104) 


where 


A? Ao)  = 


fOO  aOO  r  rMij(x  o-Xq) 


/fl,  ao  Uo 


(a;2  +  r2  +  u>2)3/2  V  A  +  r2  +  ^2 


exp  -  ((A)2  +  r2)  7  /A 


(Vo)2  +  r2) 


which  must  be  determined  numerically.  This  is  simplified  because 


Mij  (xo  -  x'0)  =  [  lX°  <  Rij 

{  0  \x0-x'0\>Rij 


which  implies  that 


x0  —  Rij  <  x'0  <  x0  +  Ri 


fij  Ao)  = 


00  [  r\/Rij-{x o-^o)2 


'  max{ i?j  ,xq  —  Rij  }  J  0  1^0 

exp  -  ^(xg)2  +  r2)  1  /A 

((A)2+r2)  7 


(&o  +  r2  +  tu2)3/2  VA  +  r2  +  w2 


drdx'0. 


The  scaling  factor  is  then 


a  ^  _  2eeo  kT _ 1 _ 

g2  A2  2y  Pyath  exp  (- -  ao0  Ej  Pyath  Ay  Ao) 


2.5  Wall  component 


For  the  wall  component, 


E  /  Qft  (x')  +  (x'))  Kf  (x0  Ay  (x, x')  dx' 


'|x-x'|  >fli_ 


#//-,/  Pj  (x') 

pj  (X  )  .bath 


pbith--i )  Ai  (x-x')dx' 


E  /  pf  (x')  f^bih  }  -  Ay  (X>XVX' 


pf  (x') 

pj  (X  )  .bath 


—  1  Ay  (x,xVx'. 


die  (108) 
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The  first  intergal  is  the  solution  of  a  Poisson  equation  similar  to  (fin  above.  For  the  second  integral, 


1  ipij  (x,  x')  dx' 


W'tpij  (x,x' ,w)  dw  dx' 


-  1 


wipP  (x,  x' ,  w)  dw  dx' 


f  (0)  /  f\ 

Pj  0 ) 

.-bath 


-  1  j  (\/ Rlj  +  4x0x'0  -  (a:0  +  Sq)) 


dx' . 


(114) 

(115) 

(116) 
(117) 


2.6  Toward  an  efficient  numerical  implementation 

For  the  Journal  of  Physical  Chemistry  Letters  paper,  the  dielectric  DFT  equations  were  solved  using  a 
specially  written  program  developed  by  the  PI.  However,  this  code  is  very  inefficient,  not  practical  for  real 
problems,  and  not  accessible  to  other  scientists.  Therefore,  the  PI  rewrote  the  entire  DFT  code  (dielectric  and 
older  components)  using  the  platform  of  Mathematica.  Mathematica  was  chosen  because  it  is  widely  available 
for  all  computer  platforms  (Windows,  Mac,  Linux)  and  has  excellent  support  and  upgrades.  Mathematica 
also  includes  a  huge  number  of  built-in  functions  that  make  programming  significantly  easier. 

Since  the  end  of  Year  1,  much  time  has  been  spent  rewriting  the  Pi’s  original  DFT  code  (the  part  without 
the  dielectric  functional)  in  Mathematica.  This  work  is  nearly  done.  Part  of  the  grant  went  to  supporting 
an  undergraduate  student  (Jordan  Hoffmann,  Johns  Hopkins  University)  to  put  the  code  through  its  paces 
by  doing  calculations  on  nanofluidic  devices  (see  below).  This  very  rigorous  testing  proved  necessary,  with 
Jordan  finding  many  places  of  improvement  for  the  code  (e.g.,  systems  for  which  it  was  slow,  systems  for 
which  is  did  not  find  an  answer,  and  on  rare  occasions  systems  for  which  is  produced  an  incorrect  answer). 

The  continuing  work,  beyond  the  completion  of  the  grant,  is  the  deriving  and  implementation  of  an 
efficient  numerical  algorithm  for  the  bottleneck  areas  identified  above. 


3  Technical  Details  of  FMM  Implementation  (written  by  Dr.  Matt 
Knepley) 

3.1  FMM  Overview 

The  fast  multipole  method  (fmm)  is  an  algorithm  that  accelerates  the  application  of  certain  integral  oper¬ 
ators,  namely  those  satifying  the  Calderon- Zygmund  conditions.  After  discretization,  this  application  may 
be  expressed  in  the  form 

N 

H  u(xi)K(yj,Xi).  (118) 

i= 1 

Naively,  if  we  have  N  source  points  {xj}  for  the  field  u,  and  N  target  points  {yj}  for  the  field  /,  this 
summation  would  require  0(N 2)  operations.  FMMallows  us  to  reduce  the  computational  complexity  to 
O(N),  whic  is  essential  for  the  operation  of  our  DFT  solves.  This  acceleration  is  accomplished  by  using  a 
hierarchical  partition  of  space  into  a  tree  structure.  This  tree  structure  allows  efficient  queries  for  points 
separations  so  that  we  can  insert  approximations  for  the  kernel  action  at  appropriate  points. 

We  use  a  diagram  of  the  tree  structure  to  illustrate  the  whole  algorithm  in  one  picture  (Figure  2).  The 
importance  of  this  bird’s  eye  view  is  that  it  relates  the  algorithm  computations  to  the  data  structure  used 
by  our  FMM,  and  it  will  prove  to  be  very  useful  when  we  discuss  the  parallel  version  that  we  have  developed. 


15 


After  the  spatial  decomposition  stage,  the  FMM  can  be  summarized  in  three  stages:  upward  sweep, 
downward  sweep,  and  evaluation  step.  In  the  upward  sweep,  the  objective  is  to  build  the  multipole  expansions 
(me  s)  for  each  node  of  the  tree.  The  ME  s  are  built  first  at  the  deepest  level,  level  L,  and  then  translated  to 
the  center  of  the  parent  nodes.  This  is  illustrated  in  Figure  2  by  the  black  arrows  going  up  from  the  nodes 
on  the  left  side  of  the  tree  (the  process  is  of  course  performed  for  the  whole  tree).  Thus,  the  ME  s  at  the 
higher  levels  do  not  have  to  be  computed  from  the  particles,  they  are  computed  from  the  ME  s  of  the  child 
nodes.  In  the  downward  sweep  of  the  tree,  first  the  ME  s  are  transformed  into  local  expansions  (le  s)  for 
all  the  boxes  in  the  interaction  list  — a  process  represented  by  the  dashed  red-colored  arrows  in  Figure  2. 
For  a  given  cell,  the  interaction  list  corresponds  to  the  cells  of  the  same  level  that  are  in  the  far  field,  but 
which  are  not  in  the  interaction  list  of  its  parent  cell.  Once  the  ME  s  have  been  translated  into  LE  s,  the 
LE  s  of  upper  levels  are  translated  to  the  centers  of  child  cells,  and  their  influence  is  added  up  to  obtain  the 
complete  far  domain  influence  for  each  box  at  the  leaf  level  of  the  tree.  This  process  is  represented  by  the 
dashed  blue-colored  arrows  going  down  the  right  side  of  the  tree  in  Figure  2.  At  the  end  of  the  downward 
sweep ,  each  box  will  have  an  LE  that  represents  the  complete  far-held  for  the  box.  Finally,  at  the  evaluation 
step,  for  every  particle  on  every  node  at  the  deepest  level  of  the  tree,  the  total  held  is  evaluated  by  adding 
the  near-held  and  far-held  contributions.  The  near  held  of  the  particles  in  a  given  cell  is  obtained  by  directly 
computing  the  interactions  between  all  the  particles  in  the  near  domain  of  the  box.  The  far  held  of  the 
particles  is  obtained  by  evaluating  the  LE  of  the  box  at  each  particle  location. 

3.2  Parallel  Strategy 

PetFMM  is  a  parallel  implementation  of  the  FMM  algorithm  designed  to  be  portable,  extensible,  scalable, 
and  easily  maintained  [2].  The  code  base  is  quite  small,  and  the  parallel  execution  completely  reuses  the 
serial  code,  simplifying  testing  and  optimization.  The  goal  of  our  parallel  strategy  is  to  achieve  an  optimal 
distribution  of  the  computational  work  among  processes  at  runtime  with  a  minimal  communication  overhead, 
and  in  this  sense  is  dynamic  load  balancing. 

We  partition  work  according  to  subtrees  of  the  original  FMM  tree.  An  important  element  of  our  parallel 
strategy  is  that  partitioning  are  carried  out  automatically  by  an  optimization  tool,  without  intervention  of 
the  user.  This  is  an  alternative  to  the  popular  space-filling  curve  methods  for  parallel  partitioning,  which 
were  hrst  used  for  tree-codes  in  [7],  and  continue  to  be  the  prevalent  method  in  both  tree-codes  and  FMM 
implementations.  Experiments  reported  in  [6]  using  space-filling  curve  partitions  for  200,000  particles  in  12 
processors,  showed  that  elapsed  wall-clock  times  for  each  processor  varied  between  60  and  140  seconds  (as 
this  work  was  before  the  time  of  multi-core  processors,  we  keep  the  terminology  “processor”  rather  than 
“process”).  Thus,  these  experiments  provide  clear  evidence  that  a  uniform  data  partition  via  space- filling 
curve  methods  can  result  in  considerable  load  imbalance,  which  we  eliminate. 

Our  strategy  for  parallelization  provides  dynamic  load  balancing  through  an  optimization  procedure, 
based  on  a  model  of  work  and  communications  for  the  algorithm.  We  utilize  the  tree  structure  associated 
with  the  hierarchical  decomposition  of  the  domain  in  order  to  decompose  the  FMM  into  subtrees,  for  which 
the  computational  model  has  been  developed.  Notice  that  the  tree  structure  has  many  roles:  it  is  used  as  a 
space  partitioner  for  the  particles,  it  organizes  the  storage  for  the  multipole  expansions  and  local  expansions, 
and  it  indicates  the  relations  between  nodes  in  the  same  level  of  the  tree  (whether  two  nodes  are  from  the 
local  domain  list  or  the  interaction  list). 

We  apply  a  sub-division  of  the  computations  by  cutting  the  d-dimensional  tree  at  a  certain  level  k,  as 
shown  in  Figure  4.  This  procedure  creates  a  root  tree,  that  contains  the  Hrst  k  levels  of  the  original  tree, 
and  2dk  local  trees,  each  corresponding  to  one  of  the  lower  branches  of  the  original  tree.  One  key  point  when 
partitioning  the  tree  is  to  obtain  more  subtrees  than  the  number  of  available  processes,  so  that  work  can  be 
evenly  distributed  across  the  processes. 

In  the  bird’s  eye  view  of  the  whole  algorithm,  data  and  computations  were  related  to  nodes  of  the  tree 
structure.  When  partitioning  the  tree  representation  into  subtrees,  computations  that  require  data  from 
different  nodes  of  the  partitioned  tree  might  access  data  from  several  different  partitions.  If  this  is  the  case, 
communication  between  subtrees  will  happen  as  illustrated  in  Figure  3.  By  relating  computations  to  nodes 
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of  the  tree,  the  work  carried  out  by  each  subtree  and  the  communication  between  different  subtrees  can  be 
estimated,  which  is  then  used  to  optimally  distribute  the  subtrees  over  available  processes. 

In  order  to  assign  subtrees  to  processes,  we  build  a  graph  representation  from  the  partitioned  tree,  as 
illustrated  on  Figure  4.  The  graph  is  assembled  such  that  the  vertices  of  the  graph  correspond  to  the 
subtrees  and  the  edges  correspond  to  communication  between  subtrees.  Using  the  graph  representation,  we 
can  assign  weights  to  the  vertices  which  are  proportional  to  the  amount  of  computational  work  performed 
by  each  subtree,  and  assign  weights  to  the  edges  which  are  proportional  to  the  amount  of  communication 
that  happens  between  two  subtrees. 

The  load  balancing  in  the  parallel  algorithm  is  done  by  partitioning  the  weighted  graph  into  as  many  parts 
as  the  number  of  available  processes,  and  then  assigning  the  subtrees  to  the  processes  according  to  the  graph 
partitions.  The  problem  of  optimally  partitioning  the  graph,  such  that  the  partitions  are  well-balanced  and 
minimize  the  communication,  is  solved  by  the  graph  partitioning  tool  Par  Metis  [3,  4].  ParMetis  is  an  open 
source  graph  partitioning  tool,  and  is  widely  used  in  mesh  partitioning  applications.  Figure  5  demonstrates 
the  load  balancing  scheme  at  work.  In  an  example  computation,  1  million  particles  following  a  uniform 
distribution  have  been  placed  inside  a  square-shaped  domain  that  is  then  hierarchically  decomposed  into  a 
tree  representation;  the  depth  of  the  tree  before  partitioning  is  L  =  8.  The  tree  is  then  cut  at  level  k  =  4, 
resulting  in  256  subtrees  that  are  then  distributed  among  16  processes. 

3.3  Electrostatics 

In  preparation,  for  the  use  of  PetFMM  for  DFT  electrostatics,  we  have  developed  a  infrastructure  for  simple 
3D  electrostatics,  as  described  in  [13].  This  was  applied  to  a  few  large  bioelectrotatic  problems  in  order  to 
evaluate  accuracy  and  scalability. 

Kernels  for  both  single  and  double  layer  charge  densities  were  developed  for  both  multicore  CPUs,  such 
as  the  Intel  Nehalem,  and  GPUs,  such  as  the  NVIDIA  Tesla  1060C.  Our  novel  data-aware  queueing  system 
allows  the  execution  of  either  kernel  type  without  changing  the  underlying  FMM  code.  We  have  demonstrated 
good  scalability  for  N  >  1000  [14],  which  will  be  easily  satisfied  by  our  DFT  calculations  where  we  expect 
N  =  107. 

We  have  developed  a  new  PETSc  compatibility  layer  for  PetFMM  that  will  allow  easy  integration  into  the 
existing  DFT  code.  Input  and  output  data  are  organized  in  PETSc  Vec  objects,  and  MatShell  wrappers  for 
FMM  evaluation  allow  it  to  be  seamlessly  used  in  all  PETSc  solvers. 

To  demonstrate  the  scalability  on  large  problems,  we  use  collections  of  randomly  oriented  lysozyme 
molecules  arranged  on  a  regular  Cartesian  grid,  as  a  mimic  of  the  Brownian  dynamics  calculations  performed 
by  McGuffee  and  Elcock  [5]  at  each  time  step.  One  such  collection,  of  1000  proteins,  is  shown  in  Figure  6. 
The  surface  charge  density  for  an  isolated  lysozyme  molecule  is  plotted  in  Figure  7. 

The  largest  simulation  we  have  conducted  consists  of  10,648  molecules,  where  each  surface  was  discretized 
into  102,486  elements.  This  calculation,  which  models  more  than  20  million  atoms  and  possesses  over  one 
billion  unknowns,  required  only  approximately  one  minute  per  iteration  on  512  nodes.  The  results  of  a  more 
detailed  scalability  study  of  our  code  is  shown  in  Figure  8.  The  present  code  can  calculate  the  matrix-vector 
product  of  6.4  billion  elements  in  57  seconds.  This  amounts  to  a  performance  of  34.6  TFlops,  as  shown  in 
Table  1 


4  Implementation  of  the  FMM  method  (written  by  Dr.  Matt 
Knepley) 

Dr  Knepley  is  part  of  the  grant  through  a  subcontract.  His  role  is  to  help  develop  a  3D  version  of  the  dielectric 
DFT  code,  which  includes  the  components  without  dielectrics  first.  He  is  also  responsible  for  developing  fast 
algorithms  for  electrostatics  and  dielectrics  that  will  be  used  in  the  code.  He  has  made  progress  on  both 
fronts  in  Year  2. 
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Table  1:  Flops  calculation 


Description 

Equation 

Value 

Total  number  of  points 

Np 

6.4  x  10y 

Number  of  FMM  cells 

Ncell 

8s 

Points  per  cell 

Np/Nceu 

381 

Interaction  list 

Nlist 

27 

Operations  per  interaction 

K 

30 

Time  to  solution 

T 

57 

Total  number  of  operations 

KNpNlistNp/Ncell/T 

3.46  x  1013 

4.1  DFT  Development 

We  have  produced  a  three  dimensional  Density  Functional  Theory  (DFT)  code  capable  of  simulating  ionic 
liquids  in  generic  geometries  using  both  hard  sphere  and  electrostatic  interaction  potentials.  This  code 
follows  the  development  in  [9]  and  [10].  It  is  based  on  the  PETSc  library  [11],  and  in  particular  on  the 
ability  of  PETSc  to  manage  parallel  multidimensional  grids  and  Fast  Fourier  transforms  (FFTs).  With  these 
building  blocks,  we  could  create  a  efficient,  scalable  code  incorporating  the  key  computational  insights  for 
DFT,  detailed  below. 

The  most  important  piece  of  technology  introduced  in  our  3D  DFT  simulation  is  the  use  of  spectral 
quadrature,  which  enables  accurate  conservation  of  Rosenfeld’s  fundamental  measures,  as  well  as  fast  and 
accurate  evaluation  of  RFD  electrostatics.  In  one  dimension,  selection  of  a  quadrature  rule  which  accurately 
captures  the  volume  of  a  ball,  the  surface  of  a  sphere,  and  the  directional  average  over  the  sphere  is  straight¬ 
forward,  even  in  the  presence  of  discontinuities  in  the  field  being  integrated.  However,  in  three  dimensions, 
this  problem  is  intractable  and  discontinuities  in  the  integrand,  generally  coming  from  a  weighting  functions, 
can  result  in  0(  1)  errors  in  the  result.  The  method  of  spectral  quadrature  moves  the  integral  to  Fourier 
space,  using  L2  isometry,  where  both  integrands  are  smooth,  and  can  be  readily  integrated  using  traditional 
quadrature.  Then  the  result,  also  smooth,  is  accurately  moved  to  real  space  using  the  FFT. 

As  an  example,  when  we  evaluate  the  basis  vectors  na  used  in  Rosenfeld’s  formulation  of  the  hard 
sphere  interaction  potential,  we  must  convolve  the  density  with  a  discontinuous  weight  function.  Using  the 
FFT  produces  unacceptable  errors  and  non-conservation  of  the  fundamental  meaures.  Thus  we  use  analytic 
Fourier  transforms  of  the  weight  functions  in  order  to  stably  evaluate  each  na,  a  simple  example  of  spectral 
quadrature.  This  same  method  was  applied  to  generate  fast  and  accurate  evaluations  of  the  RFD  reference 
density  which  using  an  averaging  integral  with  a  discontinuous  weighting  function.  Moreover,  the  method 
of  spectral  quadrature  was  combined  with  a  precise  windowing  method  to  reduce  the  RFD  evaluation  time 
from  0(N2)  to  O (IV  log  AT),  reducing  runtime  from  weeks  to  hours. 

The  incorporation  of  the  RFD  model  for  local  electrostatic  correlations  [10]  as  necessitated  the  intro¬ 
duction  of  another  unknown  field,  F,  the  local  inverse  screening  length.  We  have  augmented  the  existing 
nonlinear  algebraic  solver  to  handle  multiple  fields.  We  have  also  introduced  a  nonlinear  preconditioning  step 
which  provides  a  local  approximate  solution  to  T  before  using  the  Picard  iteration  to  improve  the  density 
estimate.  This  was  shown  to  provide  much  better  convergence  for  the  overall  nonlinear  system. 

4.2  Electrostatics 

The  following  descriptions  of  the  algorithms  will  be  implemented  in  the  3D  DFT  code  by  Dr.  Knepley. 
They  are  generally  applicable  and  therefore  he  is  testing  them  on  other  systems  because  these  systems 
have  established  benchmarks  and  compute  faster  than  DFT.  In  this  way,  the  implementation  is  much  more 
efficient. 
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(b)  Gravity  field 


Figure  2:  Synthetic  model  used  thorough  out  the  numerical  experiments,  (a)  Domain  and  density  anomaly 
and  (b)  the  corresponding  analytic  gravity  field  gz  (mGal).  The  inclusion  is  indicated  by  the  transparent 
blue  cube.  See  text  for  dimensions  of  the  domain  and  density  anomaly. 


4.2.1  Using  FMM 

In  preparation,  for  the  use  of  PetFMM  for  DFT  electrostatics,  we  have  developed  an  infrastructure  for  simple 
3D  electrostatics,  as  described  in  [12]  and  [13].  This  was  applied  to  a  few  large  problems  of  gravity  inversion 
and  bioelectrotatics  in  order  to  evaluate  accuracy  and  scalability.  For  three  dimensions,  we  have  improved  the 
scalable  partitioning,  and  now  allow  fully  parallel  input  and  output  of  the  charge,  potential,  and  electric  field 
distribution.  In  Fig.  2,  we  show  a  density  anomaly  and  the  corresponding  gravity  field  generated  which  we 
used  to  benchmark  the  parallel  performace  of  PetFMM  in  three  dimensions.  In  Table  3,  we  see  that  PetFMM 
shows  good  strong  scaling  up  to  2000  processors.  In  the  coming  year,  we  hope  to  reduce  the  imbalance  due 
to  work  on  the  root  tree  to  enable  strong  scaling  to  20,000  cores. 

Kernels  for  both  single  and  double  layer  charge  densities  were  developed  for  both  multicore  CPUs,  such 
as  the  Intel  Nehalem,  and  GPUs,  such  as  the  NVIDIA  Tesla  1060C.  Our  novel  data-aware  queueing  system 
allows  the  execution  of  either  kernel  type  without  changing  the  underlying  FMM  code.  We  have  demonstrated 
good  scalability  for  N  >  1000  [14],  which  will  be  easily  satisfied  by  our  DFT  calculations  where  we  expect 
N^IO7. 

We  have  developed  a  new  PETSc  compatibility  layer  for  PetFMM  that  will  allow  easy  integration  into  the 
existing  DFT  code.  Input  and  output  data  are  organized  in  PETSc  Vec  objects,  and  MatShell  wrappers  for 
FMM  evaluation  allow  it  to  be  seamlessly  used  in  all  PETSc  solvers. 
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np 

n 

Lv 

M 

CPU  time  (sec) 

Efficiency 

8 

2 

4 

96 

3.9740e+02  (S) 

- 

16 

2.0950e+02  (S) 

95% 

32 

1.1086e+02  (S) 

90% 

64 

5.9088e+0I  (D) 

84% 

32 

3 

5 

192 

9.2118e+02  (S) 

64 

4.8627e+02  (D) 

95%,  - 

128 

2.5809e+02  (S) 

89%.  94% * 

256 

1.4380e+02  (D) 
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Figure  3:  Strong  scaling  of  PetFMM  on  CADMOS  BG/P.  The  times  reported  here  represent  the  total  time 
taken  to  perform  the  multipole  summation  (ParaFMMEvaluate).  (S)  denotes  -mode  SMP,  (D)  denotes  - 
mode  DUAL,  (V)  denotes  -mode  VN.  *  indicates  efficiency  was  computed  w.r.t  the  64  CPU  execution  time 
(Pi  =  64). 


4.2.2  BIBEE  Approximation 

In  preparation  for  the  calculations  necessary  for  dielectric  boundary  terms  in  DFT,  we  have  examined 
approximations  for  calculation  of  an  electrostatic  field  in  the  presence  of  a  dielectric  boundary.  The  BIBEE 
approximation  [15]  appears  to  be  an  effective  option  for  DT  since  it  relies  only  on  approximating  the  integral 
operator  in  the  boundary  integral  equation  (BIE)  formulation  of  electrostatics,  rather  than  specifics  of  the 
problem  setup  and  method  of  evaluation  as  in  the  Generalized  Born  (GB)  framework.  Moreover,  we  were 
able  to  prove  rigorous  upper  and  lower  bounds  for  the  BIBEE  approximation  [16],  as  opposed  to  GB. 

In  recent  work,  we  have  deepened  the  analysis  of  BIBEE  by  considering  the  case  of  a  spherical  inclu¬ 
sion  [17],  for  which  a  complete  analytic  solution  is  available.  This  led  to  the  construction  of  the  superior 
BIBEE/M  approximation  and  a  better  understanding  of  separable  approximations  for  potential  solutions 
with  a  dielectric  jump.  We  anticipate  that  this  scheme  can  be  directly  used  in  DFT  to  provide  good  approx¬ 
imations  to  the  long-range  electrostatic  potential  </>.  However,  it  also  seems  likely  that  this  approximation 
technique  can  be  leveraged  to  simplify  the  calculation  of  the  near-field  electrostatic  correlations  as  well. 

5  Local  Equilibrium  Grand  Canonical  Monte  Carlo 

This  work  is  in  collaboration  with  Dr.  Dezso  Boda  of  the  University  of  Pannonia,  Hungary.  It  stems  out  of 
discussions  that  we  had  about  the  dielectric  DFT  functional  on  a  grant-sponsored  trip  to  his  department. 
This  work  is  not  meant  to  replace  any  work  on  DFT,  but  rather  as  an  alternative  method  for  scientists  to 
use  if  the  DFT  is  not  appropriate  for  them. 

The  purpose  of  the  LE-GCMC  method  is  to  develop  a  well-defined  molecular  model  computed  by  Monte 
Carlo  (MC)  simulations  coupled  to  the  Nernst-Planck  (NP)  equation  of  electrodiffusion: 

ja(r)  = --^a(r)c“(r)V//*(r),  (119) 
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where  T  is  temperature,  k  is  Boltzmann’s  constant,  j“(r)  is  the  particle  flux  density  ( a  =  1, . . . ,  M  refers 
to  a  diffusing  species),  Da{ r)  is  the  diffusion  coefficient  profile,  c“(r)  is  the  density  (concentration)  profile, 
/x“(r)  =  +  zae$(  r)  is  the  electrochemical  potential  profile  with  /Zq  (r)  being  the  the  chemical  potential 

in  the  absence  of  an  applied  electric  field,  za  the  valence,  e  the  electronic  charge,  and  <f>(r)  the  mean  potential. 
MC  is  useful  for  ions  at  a  dielectric  interface  as  well  because  we  have  already  created  a  fast  method  to  include 
dielectrics  in  the  MC  and  therefore  including  it  in  the  method  outlined  below  is  straight-forward  and  will 
constitute  one  of  the  next  steps  of  the  project. 

The  LE-GCMC  is  a  big  step  forward  because  we  suggest  (for  the  first  time,  to  our  knowledge)  applying  MC 
simulations  locally  for  subvolumes  of  the  simulation  cell  that  are  assumed  to  be  in  local  equilibrium.  Because 
these  subvolumes  are  characterized  by  their  volumes  V),  chemical  potentials,  /jt ,  and  the  temperature,  T, 
they  represent  open  systems  and,  therefore,  the  native  ensemble  of  the  simulations  in  these  subvolumes  is 
the  GC  ensemble.  Therefore,  we  introduce  the  LE-GCMC  method  to  simulate  a  globally  non-equilibrium 
steady-state  system  with  spatially  varying  chemical  potential.  In  this  method,  we  apply  independent  particle 
insertion/deletion  steps  for  the  various  subvolumes  with  acceptance  probabilities 


pf  =  min 


1, 


JVf! 

(N?  +  X)! 


V*  exp 


kT  )}' 


(120) 


where  iV“  the  number  of  particles  of  species  a  in  subvolume  V)  before  insertion/deletion,  jif  is  the  chemical 
potential  of  species  a  in  this  subvolume,  and  %  =  1  for  insertion,  while  %  =  —  1  for  deletion  (Metropolis 
sampling).  Particle  displacements  from  subvolume  V)  to  subvolume  Vj  are  accepted  with  probability 


pf_>j  =  min 


1,  exp 


(121) 


The  mean  electrical  potential  profile  is  computed  in  the  simulation  “on  the  fly”  by  using  the  inserted  ions 
as  test  charges.  The  electrical  potentials  computed  at  the  position  of  the  inserted  ion  are  averaged  over  the 
subvolumes. 

A U  is  the  energy  change  associated  with  the  insertion/deletion/movement  of  an  ion  and  it  contains  the 
ion’s  interaction  with  an  applied  potential,  zae<bappl(r),  where  <f>appl(r)  is  the  solution  of  Laplace’s  equation 
with  the  prescribed  boundary  condition.  The  LE-GCMC  steps  are  coupled  only  through  temperature  and 
energy.  The  energy  change  A U  contains  not  only  the  interaction  energies  between  particles  in  subvolume  V), 
but  also  the  effect  of  particles  outside  this  subvolume.  The  interactions  with  these  particles  can  be  considered 
as  an  external  constraint  on  the  particles  in  subvolume  V). 

The  result  of  the  LE-GCMC  simulation  is  a  set  of  density  profiles,  {c1(r), . . . ,  cM(r)},  obtained  for  a  set 
of  chemical  potential  profiles  and  the  fixed  temperature,  {T, /r1(r), ..., /xM(r)}.  The  LE-GCMC  simulation, 
therefore,  provides  the  same  information  that  DFT  does  with  the  difference  that  it  uses  the  chemical  poten¬ 
tials  as  independent  variables  of  the  ensemble  instead  of  the  densities.  The  advantages  of  the  LE-GCMC 
technique  are  that  (1)  it  can  be  applied  to  three-dimensional  systems  with  a  wider  variety  of  geometries  and 
pair-potentials,  (2)  it  provides  exact  results  (apart  from  statistical  and  system  size  errors),  while  theories 
necessarily  contain  approximations,  and  (3)  it  provides  a  mean  electrical  potential  that  automatically  satis¬ 
fies  Poisson’s  equation,  because  Poisson’s  equation  is  true  in  every  sampled  configuration,  and  consequently, 
it  is  true  for  their  average  too.  The  iteration  procedure  given  in  this  paper  ensures  that  the  resulting  mean 
potential  satisfies  the  prescribed  boundary  condition. 

The  LE-GCMC  simulation  then  provides  the  density  profiles  for  given  chemical  potential  profiles.  Sub¬ 
stituting  these  quantities  and  the  diffusion  coefficient  profiles  (it  must  be  provided  by  the  user)  into  the  NP 
equation,  the  flux  density  can  be  calculated. 

There  is  no  guarantee  that  even  an  intelligent  initial  guess  for  the  chemical  potential  profiles  provides  flux 
densities  that  satisfy  the  continuity  equation.  Therefore,  we  compute  new  chemical  potential  profiles  that 
would  provide  flux  densities  that  satisfy  the  continuity  equation  using  the  density  profiles,  cf(n),  obtained 
from  the  LE-GCMC  simulation  in  the  nth  iteration.  These  new  chemical  potenial  profiles,  pf(n  +  1),  are 
used  in  the  LE-GCMC  simulations  in  the  next,  (n  +  l)tlr,  iteration.  The  heart  of  the  iteration  algorithm  can 
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be  given  as  follows.  The  divergence  theorem  for  the  ith  subvolume  is 


0=  [  V-j“(r )dV=  l  j“(r)  •  ds,  (122) 

JVi  JSi 

where  Si  denotes  the  surface  of  the  volume  element.  If  we  denote  the  faces  of  volume  element  V)  by  Sj:j  on 
which  the  densities,  chemical  potentials,  and  flux  densities  are  assumed  to  be  constant,  the  surface  integrals 
can  be  written  as  a  sum  over  these  faces: 


°=  E  j“(^)  •  n(5ij)ay, 

j,  SijeSi 


(123) 


where  n(,S,;?j  and  ai:j  denote  the  outward  normal  vector  and  area  of  face  Si:j ,  respectively.  Substituting  the 
NP  equation  for  j“  (Sg ) ,  we  obtain  the  equation  for  the  computation  of  the  chemical  potentials  for  the  next 
iteration: 

0=  E  A“c“(n)V/x“.(«+l)-ny  (124) 

j,  SijeSi 

where  £>?•  =  £>“(,%),  cfj  =  ca(Sg),  /xg-  =  Ha(Sij),  and  =  n(,%). 

Boundary  conditions  are  that  the  chemical  potential  is  /j,a'L  =  /xf ,L  +  zae$L  in  the  left  hand  side,  while 
it  is  /x“,R  =  /xf ,R  +  zae<&R  in  the  right  hand  side  bulk  for  species  a.  The  bulk  chemical  potentials,  /af ,L  and 
/xg’R  corresponding  to  the  prescribed  concentrations  0.1  and  1  M  were  calculated  with  the  Adaptive  GCMC 
method  of  Malasics  and  Boda.  The  electrical  potential  in  the  left  bath  is  chosen  to  be  <f>L  =  0,  so  the  value 
<1> R  gives  the  voltage. 

The  calculation  domain  is  divided  into  slabs  indexed  from  0  to  N  +1  and  centers  denoted  by  ay.  Boundary 
conditions  are  set  at  Xq  and  cjv+i-  The  applied  potential  is  a  linear  function  between  0  at  xq  and  4>R  at 
Xn+i  in  the  planar  geometry.  Chemical  potentials  and  densities  in  the  slabs  are  denoted  by  /xf  and  cf. 
Quantities  on  the  boundaries  of  neighboring  slabs  are  denoted  by  prime.  The  flux  density,  the  concentration, 
and  the  diffusion  coefficient  on  the  boundary  of  the  ith  and  ( i  +  l)th  slabs  are  denoted  by  j'f.  c'f,  and  D'f, 
respectively.  Eq.  124  can  be  written  for  the  xth  slab  as 


0  =  D'?cJ(n) 


<+i(n  +  l)  -nf{n  +  l) 


%i-\- 1 

v?(n+  1)  -  /xf_i(n  +  1) 

Xj,  Xi—  i 


(125) 


where  density  c'f  is  obtained  from  interpolation  between  cf  and  cf+1 .  The  boundary  conditions  are  /Xg  = 
/x“,L,  Mg+i  =  Cq  =  ca,L,  and  cg+1  =  c“,R.  This  yields  N  linear  equations  for  the  N  unknowns, 

{/xf(n  +  l),...,/x?(n+  1)}. 

We  have  done  an  implementation  of  the  procedure  for  a  one-dimensional  test  system  of  ions  diffusing 
through  a  membrane.  This  made  it  possible  to  make  a  direct  comparison  with  DFT-PNP  results.  The 
results  were  indistinguishable  after  just  3  or  4  iterations  (not  shown).  A  manuscript  of  the  work  was  recently 
published  in  Journal  of  Chemical  Theory  and  Computation. 

Currently  work  is  being  done  to  include  dielectric  interfaces.  This  is  straight-forward  since  Dr.  Boda’s 
MC  simulations  already  include  this  feature  due  to  a  long-time  collaboration  to  efficiently  include  it  in  the 
simulations.  This  should  prove  to  be  a  very  useful  route  for  many  scientists  to  include  dielectric  interfaces 
into  their  ion  current  calculations. 


6  Connecting  with  experiments  in  nanofluidic  devices 

Experiments  on  real-world  systems  is  the  ultimate  test  of  any  theory  and  part  of  this  year  was  designed  to 
interface  with  experimental  systems  that  could  be  used  to  test  the  real-world  applicability  of  the  dielectric 
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Figure  4:  Comparing  DFT  with  charge  inversion  streaming  currents  experiments.  The  solid  line  is  the  DFT 
result  and  the  symbols  are  the  experiments  of  van  der  Hayden  et  al.  (PRL,  96:224502)  .  The  dashed  line 
connects  the  symbols  and  is  meant  to  guide  the  eye.  The  change  of  sign  of  the  current  indicates  that  charge 
inversion  is  present,  (a)  CaCl2  concentration  is  increased,  (b)  CoSepCl3  concentration  is  increased. 


functional  (which  is  afterall  the  overarching  goal  of  producing  such  a  functional).  This  work  was  mainly  on 
nanofluidic  devices. 

This  work  is  in  collaboration  with  Prof.  Sumita  Pennathur  at  the  University  of  California  at  Santa 
Barbara.  The  grant  supported  trips  to  UCSB  to  discuss  the  work  and  the  one  paper  that  we  produced  last 
year  with  another  in  preparation. 

Fluidic  devices  can  be  fabricated  with  nanometer-scale  features  using  the  same  techniques  used  for  silicon 
semiconductors.  These  devices  hold  the  promise  to  analyze,  separate,  concentrate,  manipulate,  and  detect 
specific  molecules  with  exquisite  sensitivity  and  high  throughput.  Applications  include  DNA  sequencing, 
medical  testing,  and  biowarfare  defense.  These  applications  can  be  realized  because  the  fluids  are  confined 
to  slits  or  channels  whose  smallest  dimension  is  tens  of  nanometers  in  size  (with  the  other  dimensions  still 
macroscopic).  The  smaller  the  confining  direction  is,  the  more  that  surface  effects — which  are  negligible 
in  macroscopic  systems — come  to  dominate  over  normal  bulk  properties.  The  ionic  current  and  velocities 
through  these  nanoscale  electrokinetic  devices  is  then  defined  by  the  structure  of  the  electrical  double  layers 
at  the  device  walls.  Modeling  such  devices  then  requires  accurate — and  hopefully  fast-computing — theories 
to  calculate  the  double  layers.  DFT  is  such  a  theory  and  that  is  what  we  did  in  a  first-of-a-kind  application 
in  the  field  of  nanofluidics  [18].  In  that  paper,  we  examined  the  effect  of  finite  ion  size  in  nanofluidic 
devices  by  modeling  the  ions  as  charged,  hard  spheres.  We  showed  that  the  ion-ion  correlations  included  in 
DFT — but  absent  from  Poisson-Boltzmann  theory  and  many  of  its  generalizations — produce  a  wide  range  of 
counterintuitive  phenomena  (e.g.,  charge  inversion),  both  in  the  electrical  double  layers  at  device  walls  and 
in  the  resulting  qualitatively  different  pressure-driven  and  electro-osmotic  currents. 

We  also  reproduced  very  challenging  experimental  data  that  involved  charge  inversion.  Also  called  over- 
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(ct|  (C/m7) 

Figure  5:  Energy  conversion  efficiency  as  a  function  wall  surface  charge  for  ions  of  diameter  (in  mn)  0.3 
(magenta),  0.6  (green),  0.9  (gray),  1.2  (red),  and  1.5  (blue).  The  surface  charge  on  the  slit  walls  is  negative, 
the  slit  height  is  10  nm,  the  counterion  concentration  is  100  mM,  its  diffusion  coefficient  is  0.5  m2/s,  and 
the  co-ion  is  C1-.  The  range  of  efficiencies  for  each  surface  charge  is  bracketed  by  choosing  the  Stern  layer 
height  to  be  0  or  the  diameter  of  the  counterion. 


charging,  charge  inversion  occurs  when  more  counterions  are  adsorbed  on  a  charged  surface  than  is  necessary 
to  neutralize  the  surface  charge.  As  a  result,  a  layer  of  co-ions  is  adsorbed  behind  the  first  layer  of  counterions. 
When  enough  co-ions  form  this  second  layer,  the  mean  electrostatic  potential  can  change  sign  (compared  to 
the  potential  at  the  wall  surface) .  Since  this  potential  defines  the  velocity  of  the  ions  when  they  are  driven  by 
an  applied  voltage,  a  change  of  sign  indicates  a  reversal  of  velocity  and  this  produces  qualitatively  different 
results  than  a  potential  that  does  not  change  sign  (as  in  most  of  the  theories  used  on  nanofluidic  devices). 
DFT  is  one  of  the  few  theories  that  produces  charge  inversion  and  we  used  it  to  reproduce — for  the  first  time 
by  any  theory  that  we  are  aware  of — the  experiments  of  Cees  Dekker’s  lab  that  showed  charge  inversion  (Fig. 
4).  This  work  is  also  important  because  it  showed  that  DFT  can  reproduce  experimental  data  for  systems 
with  particles  that  are  trivalent  (+3  charge)  and  large  (9  A  in  diameter)  and  at  high  concentrations  (up  to 
1  M).  No  other  theory  like  Poisson-Boltzmann  we  are  aware  of  can  do  this. 

Another  nanofluidics  application  of  was  recently  published  in  the  prestigeous  Nano  Letters.  This  con¬ 
cerned  using  the  ion  correlations  in  DFT  to  produce  very  high  efficiency  in  the  energy  conversion  process 
from  pressure  to  voltage.  Ions  in  nanofluidic  devices  can  be  moved  either  by  applying  pressure  or  by  ap¬ 
plying  a  voltage.  When  moved  with  pressure,  the  ions  produce  a  streaming  potential  and  this  can  be  used 
to  drive  an  electrical  system.  The  efficiency  of  this  process  was  always  described  a  being  very  low,  but  this 
theoretical  work  did  not  include  the  effect  of  the  ions’  size.  With  DFT  we  included  this  an  showed  that  very 
high  efficiencies  (>50%)  were  possible,  in  principle,  and  >30%  in  devices  that  are  fabricated  today.  This  is 
shown  in  Fig.  5. 
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